-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-40473][SQL] Migrate parsing errors onto error classes #37916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
srielau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very cool!
I have some general questions, observations:
- You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"
- How do we keep the numbering straight for the next n PRs? I assume you used some tooling to whip this up so fast. So this was 0 => 1. How do we get from n => n + 1?
|
|
||
| /**/ | ||
| { | ||
| "errorClass" : "_LEGACY_ERROR_TEMP_055" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we loose the context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We don't provide context in many places (I mean QueryContext not line, pos). This is one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case, users will see the same error message by default (not JSON).
Some time ago, I have counted the total number of exceptions to be ported onto error classes around 800. I don't see any problems to start using 4 digits after 999.
We have 3 types of errors, actually: parsing, compilation and execution. We could migrate them 1-by-1 using sequential numbers. Otherwise the PR will conflict a lot.
I think it will be faster to manually port the exceptions without any tooling. |
| } catch { | ||
| case e: NumberFormatException => | ||
| throw new ParseException(e.getMessage, ctx) | ||
| throw new ParseException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also mention it in readme.md or somewhere: we don't have to instantiate the exception instance in a central file if it's only used once. This can also simplify the error migration work.
|
Can we update |
| 4. Check if the exception type already extends `SparkThrowable`. | ||
| If true, skip to step 6. | ||
| 5. Mix `SparkThrowable` into the exception. | ||
| 5. Mix `SparkThrowable` into the exception, and place it to Query.*Errors at least when it is invoked more than once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can change item 6 instead
Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create a util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
@dongjoon-hyun @viirya @HyukjinKwon The new error class name prefix |
|
|
||
| Error classes are a succinct, human-readable representation of the error category. | ||
|
|
||
| An uncategorized errors can be assigned to a legacy error class with the prefix _LEGACY_ERROR_TEMP_ and an unused sequential number, for instance _LEGACY_ERROR_TEMP_053. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we should using \ in front of _LEGACY_ERROR_TEMP_ as an escape character ??
Otherwise, maybe it should look something like: LEGACY_ERROR_TEMP in the documents, which is missing leading and following underscore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will mark it as code. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
|
@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang @dongjoon-hyun @viirya I would like to merge this if there are no objections from your side. |
| throw new ParseException( | ||
| errorClass = "_LEGACY_ERROR_TEMP_0061", | ||
| messageParameters = Map("msg" -> e.getMessage), | ||
| ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do for re-throws? @MaxGekk @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, we check either the exception belongs to an error class or not. If so, just re-throw it as is otherwise assign an error class. But in this PR, I just want to assign error classes and don't want to change existing behaviour.
|
Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for review. |
### What changes were proposed in this pull request? In the PR, I propose to use `checkError()` to check the error class, message parameters and the query context. After the PR #37916, all parsing exceptions should have an error class. ### Why are the changes needed? 1. Checking of the error classes plus the query context improves the test coverage. 2. Eliminating the dependency of error messages. So, tech editors will be able to modify `error-classes.json` w/o fixing the test suite. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *ErrorParserSuite" ``` Closes #38204 from MaxGekk/intercept-parsing-error-class. Authored-by: Max Gekk <[email protected]> Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix
_LEGACY_ERROR_TEMP_. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.Why are the changes needed?
The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes.
Does this PR introduce any user-facing change?
No. The error messages should be almost the same by default.
How was this patch tested?
By running the modified test suites: